Applications in Computer Vision
FIGURE 6.8
(a) and (b) show the distribution of the unbinarized weights w_i of the 6-th 1-bit layer
in the 1-bit PointNet backbone when trained under XNOR-Net and our POEM, respectively.
From left to right, we report the weight distribution at initialization and at the 40-th,
80-th, 120-th, 160-th, and 200-th epochs. Our POEM obtains a clearly bimodal distribution,
which is much more robust.
Weight distribution: The POEM-based model is based on an Expectation-Maximization
process implemented on the PyTorch [186] platform. To confirm our motivation, we compare
the weight distributions obtained when training with XNOR-Net and with POEM. For a 1-bit
PointNet model, we analyze the 6-th 1-bit layer, which is of size (64, 64) and has 4096
elements. We plot its weight distribution at the {0, 40, 60, 120, 160, 200}-th epochs.
Figure 6.8 shows that the initialization (0-th epoch) is the same for XNOR-Net and POEM.
However, our POEM employs the Expectation-Maximization algorithm to supervise the
back-propagation process, leading to an effective and robust bimodal distribution. This
analysis is also consistent with the performance comparison in Table 6.5.
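One way to quantify how bimodal such a weight distribution is would be to fit a two-component Gaussian mixture with EM and measure how far apart the two modes are. The sketch below is only an illustration on synthetic weights, not the EM procedure POEM uses during training; the helper names (make_weights, fit_two_gaussians, separation) and the synthetic means and standard deviations are assumptions made for this example.

```python
import math
import random


def make_weights(bimodal, n=4096, seed=0):
    """Synthetic stand-in for the 4096 latent weights of a (64, 64) layer."""
    rng = random.Random(seed)
    if bimodal:
        # Two narrow modes, as one would hope to see for a well-trained 1-bit layer.
        return [rng.gauss(-0.1 if i % 2 else 0.1, 0.02) for i in range(n)]
    # A single zero-centred mode.
    return [rng.gauss(0.0, 0.05) for _ in range(n)]


def fit_two_gaussians(weights, iters=50):
    """Fit a two-component 1-D Gaussian mixture to `weights` with plain EM."""
    n = len(weights)
    mean = sum(weights) / n
    var_all = max(sum((w - mean) ** 2 for w in weights) / n, 1e-8)
    # Start one component at each extreme so they can move towards the two modes.
    mu = [min(weights), max(weights)]
    var = [var_all, var_all]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of component 1 for every weight.
        resp1 = []
        for w in weights:
            p = [
                pi[k] * math.exp(-((w - mu[k]) ** 2) / (2 * var[k]))
                / math.sqrt(2 * math.pi * var[k])
                for k in range(2)
            ]
            resp1.append(p[1] / (p[0] + p[1] + 1e-300))
        # M-step: re-estimate mixing weights, means, and variances.
        for k in range(2):
            r = resp1 if k == 1 else [1.0 - x for x in resp1]
            total = sum(r) + 1e-12
            pi[k] = total / n
            mu[k] = sum(ri * w for ri, w in zip(r, weights)) / total
            spread = sum(ri * (w - mu[k]) ** 2 for ri, w in zip(r, weights)) / total
            var[k] = max(spread, 1e-8)  # floor keeps a component from collapsing
    return pi, mu, var


def separation(mu, var):
    """Distance between the two fitted means in units of the pooled std."""
    return abs(mu[1] - mu[0]) / math.sqrt(0.5 * (var[0] + var[1]))


_, mu_uni, var_uni = fit_two_gaussians(make_weights(bimodal=False))
_, mu_bi, var_bi = fit_two_gaussians(make_weights(bimodal=True))
print("unimodal separation:", round(separation(mu_uni, var_uni), 2))
print("bimodal separation: ", round(separation(mu_bi, var_bi), 2))
```

A larger separation score indicates two well-separated modes. Applied to the real latent weights at each recorded epoch, such a score would give a numeric counterpart to the histograms in Figure 6.8.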
6.4 LWS-Det: Layer-Wise Search for 1-bit Detectors
The performance of 1-bit detectors typically degrades to the point where they are not widely
deployed on real-world embedded devices. For example, BiDet [240] achieves only 13.2%
mAP@[.5, .95] on the COCO minival dataset [145], 10.0% below its real-valued counterpart
(on the SSD300 framework). We believe the reason is that the layer-wise binarization error
significantly affects 1-bit detector learning.
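To make the notion of a layer-wise binarization error concrete, the sketch below measures how much of a layer's weight energy is lost when each output channel is replaced by a scaled sign vector. It assumes the XNOR-Net-style approximation w ≈ α · sign(w) with α = mean(|w|); the error measure actually used by LWS-Det may be defined differently, and the function names and toy layers here are hypothetical.

```python
def binarize_row(row):
    """Return (alpha, signs) for one output channel, with alpha = mean(|w|)."""
    alpha = sum(abs(w) for w in row) / len(row)
    signs = [1.0 if w >= 0 else -1.0 for w in row]
    return alpha, signs


def layer_binarization_error(weight):
    """Relative squared error ||W - alpha * sign(W)||^2 / ||W||^2 of one layer.

    `weight` is a list of rows, one row per output channel.
    """
    err = 0.0
    norm = 0.0
    for row in weight:
        alpha, signs = binarize_row(row)
        err += sum((w - alpha * s) ** 2 for w, s in zip(row, signs))
        norm += sum(w * w for w in row)
    return err / norm if norm > 0 else 0.0


# Toy layers (assumed values): one whose magnitudes are nearly constant,
# one whose magnitudes vary widely within each channel.
layers = {
    "layer_a": [[0.5, -0.5, 0.4, -0.6], [0.3, 0.3, -0.3, -0.2]],
    "layer_b": [[1.0, -3.0, 0.1, 0.05], [2.0, -0.01, 0.02, -1.5]],
}
for name, weight in layers.items():
    print(name, layer_binarization_error(weight))
```

Layers whose weight magnitudes vary strongly within a channel lose more under this approximation, which is one reason to treat the error layer by layer rather than as a single network-wide quantity.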
TABLE 6.3
The effects of different components of POEM on OA.

1-bit PointNet                                           OA (%)
XNOR-Net                                                 81.9
Proposed baseline network                                83.1
Proposed baseline network + PReLU                        85.0
Proposed baseline network + EM                           86.2
Proposed baseline network + LSF                          86.5
Proposed baseline network + PReLU + EM + LSF (POEM)      90.2
Real-valued Counterpart                                  89.2
Note: PReLU, EM, and LSF denote components introduced into our proposed baseline
network, where LSF is short for learnable scale factor. The proposed baseline network +
PReLU + EM + LSF is the POEM we propose.